Smart Paper Notes

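Explanations have attracted growing interest in the AI and machine learning (ML) communities as a way to improve model transparency and to let users form a mental model of a trained ML model. However, explanations can go beyond this one-way communication and serve as a mechanism for eliciting user control, because once users understand a model, they can provide feedback. This paper presents an overview of research in which explanations are combined with interactive capabilities as a means of learning new models from scratch and of editing and debugging existing ones. To this end, we draw a conceptual map of the state of the art, grouping relevant approaches by their intended purpose and by how they structure the interaction, and highlighting the similarities and differences between them. We also discuss open research issues and outline possible directions, in the hope of spurring further research on this blooming topic.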

translated by Google Translate

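Machine learning models may involve decision boundaries that change over time because of updates to rules and regulations, for example in loan approval or claims management. In such scenarios, however, it can take time for enough training data to accumulate before the model can be retrained to reflect the new decision boundaries. While work has been done on reinforcing existing decision boundaries, scenarios in which the decision boundaries of an ML model should change to reflect new rules have received less attention. In this paper, we focus on user-provided feedback rules as a way to speed up the ML model update process, and we formally introduce the problem of pre-processing training data in response to feedback rules, such that once the model is retrained on the pre-processed data, its decision boundaries align more closely with the rules. To solve this problem, we propose a novel data augmentation method, a feedback rule-based oversampling technique. Extensive experiments with different ML models and real-world datasets demonstrate the effectiveness of the method, in particular the benefit of augmentation and the ability to handle many feedback rules.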
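
To make the idea of rule-driven pre-processing concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a feedback rule is a predicate over a feature dictionary plus a target label, and it naively oversamples by copying existing instances that the rule covers and assigning them the rule's label. The names `oversample_with_rule`, `rule_condition`, and `rule_label` are hypothetical.

```python
import random


def oversample_with_rule(data, rule_condition, rule_label, n_new, seed=0):
    """Return a copy of `data` with `n_new` extra rule-conforming samples.

    data           -- list of (features_dict, label) pairs
    rule_condition -- hypothetical predicate: features_dict -> bool
    rule_label     -- label the feedback rule prescribes for covered samples
    """
    rng = random.Random(seed)
    # Instances that fall inside the region described by the rule.
    covered = [features for features, _ in data if rule_condition(features)]
    augmented = list(data)  # the original pairs are kept as they are
    if not covered:
        return augmented  # nothing to copy from; leave the data unchanged
    for _ in range(n_new):
        base = dict(rng.choice(covered))  # copy so the original is not shared
        augmented.append((base, rule_label))
    return augmented
```

A model retrained on the augmented data sees more examples that agree with the rule, which is the intuition behind aligning the decision boundary with it. A realistic approach would generate new synthetic points rather than exact copies.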

Federated Learning (FL) has become a key choice for distributed machine learning. Initially focused on centralized aggregation, recent works in FL have emphasized greater decentralization to adapt to the highly heterogeneous network edge. Among these, Hierarchical, Device-to-Device and Gossip Federated Learning (HFL, D2DFL and GFL, respectively) can be considered as foundational FL algorithms employing fundamental aggregation strategies. A number of FL algorithms were subsequently proposed employing multiple fundamental aggregation schemes jointly. Existing research, however, subjects the FL algorithms to varied conditions and gauges the performance of these algorithms mainly against Federated Averaging (FedAvg) only. This work consolidates the FL landscape and offers an objective analysis of the major FL algorithms through a comprehensive cross-evaluation for a wide range of operating conditions. In addition to the three foundational FL algorithms, this work also analyzes six derived algorithms. To enable a uniform assessment, a multi-FL framework named FLAGS: Federated Learning AlGorithms Simulation has been developed for rapid configuration of multiple FL algorithms. Our experiments indicate that fully decentralized FL algorithms achieve comparable accuracy under multiple operating conditions, including asynchronous aggregation and the presence of stragglers. Furthermore, decentralized FL can also operate in noisy environments and with a comparably higher local update rate. However, the impact of extremely skewed data distributions on decentralized FL is much more adverse than on centralized variants. The results indicate that it may not be necessary to restrict the devices to a single FL algorithm; rather, multi-FL nodes may operate with greater efficiency.
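
The contrast between centralized and fully decentralized aggregation can be illustrated with a small sketch. It is not taken from the FLAGS framework. It assumes each model is a flat list of floats, that centralized aggregation is a dataset-size-weighted average in the style of FedAvg, and that gossip aggregation is plain pairwise averaging. The function names are hypothetical.

```python
def fedavg(weights, sizes):
    """Centralized aggregation: average client models weighted by data size.

    weights -- list of client models, each a list of floats of equal length
    sizes   -- number of local training samples held by each client
    """
    total = sum(sizes)
    dim = len(weights[0])
    return [
        sum(w[i] * n for w, n in zip(weights, sizes)) / total
        for i in range(dim)
    ]


def gossip_round(weights, pairs):
    """Decentralized aggregation: each listed pair of nodes averages models.

    pairs -- list of (a, b) node indices that exchange models, in order
    """
    new = [list(w) for w in weights]  # do not modify the caller's models
    for a, b in pairs:
        avg = [(x + y) / 2 for x, y in zip(new[a], new[b])]
        new[a] = avg
        new[b] = list(avg)
    return new
```

In `fedavg` a single server produces one global model per round, while in `gossip_round` no node ever sees all models; repeated pairwise exchanges only gradually pull the nodes toward a common model.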

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu

Categories: Artificial Intelligence | Machine Learning

2022-12-09

Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first fast and widely-applicable pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall filters out all off-target reads before basecalling, and the highly-accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance of the state-of-the-art basecaller by 3.31x while maintaining high (98.88%) sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, sensitivity, and generality compared to prior works. We freely open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall.
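
The two-stage control flow described above (cheap noisy basecall, similarity check, then accurate basecall only for on-target signals) can be sketched as follows. This is an illustration under stated assumptions, not TargetCall's implementation: the basecallers are passed in as hypothetical callables `light_call` and `accurate_call`, and the similarity check is replaced by a simple shared k-mer fraction, which is an assumption made here and is not claimed to be how the tool matches reads to the reference.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_on_target(noisy_read, reference_kmers, k, threshold):
    """Label a noisy read on-target if enough of its k-mers occur in the reference."""
    read_kmers = kmers(noisy_read, k)
    if not read_kmers:
        return False  # read shorter than k: nothing to compare
    shared = len(read_kmers & reference_kmers)
    return shared / len(read_kmers) >= threshold


def filter_then_basecall(signals, light_call, accurate_call, reference,
                         k=4, threshold=0.5):
    """Run the expensive basecaller only on signals that pass the filter.

    light_call / accurate_call -- hypothetical callables: raw signal -> read
    k, threshold               -- illustrative values, not tuned parameters
    """
    reference_kmers = kmers(reference, k)
    reads = []
    for signal in signals:
        noisy_read = light_call(signal)  # fast, low-accuracy first pass
        if is_on_target(noisy_read, reference_kmers, k, threshold):
            reads.append(accurate_call(signal))  # slow, accurate second pass
    return reads
```

The saving comes from the fact that `accurate_call` is skipped for every signal whose noisy read is labeled off-target; the cost is that a too-strict check would discard on-target reads, which is why the abstract reports sensitivity alongside speedup.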
